
8. Pipeline Infrastructure and Cloud Costs

Data pipelines run in the customer's cloud environment. Customer data, whether in transit (being copied from a source) or at rest (in destination storage), stays within the customer's IT environment at all times. Customers pay their cloud platform provider (AWS, Azure or GCP) directly for cloud infrastructure.

For pipeline execution and support, DataStori charges a fixed monthly fee per application instance.

DataStori manages data pipelines from its own cloud. Pipelines run on Azure Container Instances or AWS Fargate, which keeps running and maintenance costs under control while staying on leading cloud platforms.

Pipeline Execution

DataStori orchestrates, schedules, manages and provides code for the data pipelines. For every pipeline run, DataStori submits a job to a queue. The customer's cloud reads the job queue and in turn spins up servers for pipeline execution.
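To make this handoff concrete, here is a minimal sketch of the orchestrator side in Python with boto3, assuming an AWS SQS queue plays the role of the job queue. The queue name, message fields and pipeline identifier are hypothetical illustrations, not DataStori's actual job schema.

```python
import json

import boto3

# Hypothetical job payload -- DataStori's real job schema is not public.
job = {
    "pipeline_id": "orders-ingest",                      # which pipeline to run
    "run_id": "2024-06-01T00:00:00Z",                    # idempotency / audit key
    "destination": "s3://customer-bucket/raw/orders/",   # customer-owned storage
}

sqs = boto3.client("sqs", region_name="us-east-1")
queue_url = sqs.get_queue_url(QueueName="datastori-pipeline-jobs")["QueueUrl"]

# One message per pipeline run; the customer's cloud polls this queue.
sqs.send_message(QueueUrl=queue_url, MessageBody=json.dumps(job))
```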

These servers spin up on demand, execute pipelines to ingest data, and are then terminated. Any data or storage associated with these servers is cleaned up at the end of the pipeline run. Ingested data is stored in the customer's cloud (AWS S3, Azure Blob Storage or GCS).
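On the execution side, the flow could look like the following boto3 sketch: receive a job, run a one-shot Fargate task, wait for it to stop, then acknowledge the message. The cluster name, task definition and subnet ID are placeholders; the actual DataStori executor logic is not public.

```python
import boto3

sqs = boto3.client("sqs")
ecs = boto3.client("ecs")

queue_url = sqs.get_queue_url(QueueName="datastori-pipeline-jobs")["QueueUrl"]

# Long-poll the job queue; at most one message per iteration here.
for msg in sqs.receive_message(
    QueueUrl=queue_url, MaxNumberOfMessages=1, WaitTimeSeconds=20
).get("Messages", []):
    # Launch an ephemeral Fargate task for this run. Fargate requires an
    # awsvpc network configuration; the subnet ID is a placeholder.
    task_arn = ecs.run_task(
        cluster="datastori-exec",
        launchType="FARGATE",
        taskDefinition="datastori-pipeline",
        networkConfiguration={
            "awsvpcConfiguration": {
                "subnets": ["subnet-0aaa1111bbb22222c"],
                "assignPublicIp": "DISABLED",
            }
        },
    )["tasks"][0]["taskArn"]

    # Block until the task stops. Its container filesystem disappears with
    # it, so the only durable output is what the pipeline wrote to the
    # customer's object store (e.g., S3).
    ecs.get_waiter("tasks_stopped").wait(cluster="datastori-exec", tasks=[task_arn])

    # Acknowledge the job only after the run has finished.
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
```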

By default, DataStori spins up execution servers with 4 GB of RAM and 16 GB of storage. Server size can be increased to match a customer's data volume and performance requirements.
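As one illustration, on Fargate the server size is set in the task definition. The sketch below registers a hypothetical task definition matching the 4 GB RAM default; all names are placeholders. Note that Fargate expresses memory in MiB and only allows ephemeral storage to be raised above its own 20 GiB baseline (21-200 GiB), so the exact storage knobs vary by platform.

```python
import boto3

ecs = boto3.client("ecs")

# Hypothetical task definition mirroring the documented 4 GB RAM default.
# In practice an executionRoleArn is usually also required (e.g., for
# private registries or log delivery).
ecs.register_task_definition(
    family="datastori-pipeline",
    requiresCompatibilities=["FARGATE"],
    networkMode="awsvpc",
    cpu="1024",      # 1 vCPU
    memory="4096",   # 4 GB RAM; raise for heavier data or performance needs
    ephemeralStorage={"sizeInGiB": 50},  # optional bump above Fargate's 20 GiB default
    containerDefinitions=[
        {
            "name": "pipeline",
            "image": "datastori/pipeline:latest",  # placeholder image name
            "essential": True,
        }
    ],
)
```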

Execution capacity, i.e., the number of concurrent pipeline runs allowed, depends on the quotas and limits set by your organization.
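One way to respect such a cap on the consumer side is to count running tasks before launching another, as in this sketch; the cluster name and quota value are illustrative, and a real implementation would paginate list_tasks for clusters with more than 100 tasks.

```python
import boto3

ecs = boto3.client("ecs")

MAX_CONCURRENT_RUNS = 5  # illustrative quota; actual limits come from your org

# Count pipeline tasks currently running in the execution cluster.
running = ecs.list_tasks(cluster="datastori-exec", desiredStatus="RUNNING")["taskArns"]

if len(running) >= MAX_CONCURRENT_RUNS:
    print("At capacity; leave the job on the queue for the next poll.")
else:
    ...  # safe to launch another execution task
```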

Infrastructure Security Policies

AWS - We can set up pipeline execution in any Virtual Private Cloud (VPC) and subnet of the customer's choice. Source applications and DataStori executors are isolated from each other.
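For instance, network placement can be pinned per run through run_task's awsvpcConfiguration. The subnet and security group IDs below are placeholders for values the customer chooses, such as private subnets and a security group that only permits egress to the source application and S3.

```python
import boto3

ecs = boto3.client("ecs")

# Customer-chosen network placement (IDs are placeholders).
network = {
    "awsvpcConfiguration": {
        "subnets": ["subnet-0aaa1111bbb22222c"],
        "securityGroups": ["sg-0ddd3333eee44444f"],
        "assignPublicIp": "DISABLED",  # keep executors off the public internet
    }
}

ecs.run_task(
    cluster="datastori-exec",
    launchType="FARGATE",
    taskDefinition="datastori-pipeline",
    networkConfiguration=network,
)
```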

Azure - Pipelines can be set up in any Resource Group, and the permissions granted to DataStori executors can be tightly controlled.
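As a sketch of the Azure side, the snippet below creates a one-shot container group inside a customer-designated resource group using the azure-mgmt-containerinstance SDK. All names are placeholders; scoping the executor's credential to just that resource group is what keeps the permissions granted to DataStori tight.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.containerinstance import ContainerInstanceManagementClient
from azure.mgmt.containerinstance.models import (
    Container, ContainerGroup, ResourceRequests, ResourceRequirements,
)

# This credential only needs rights on the chosen resource group, not the
# whole subscription.
client = ContainerInstanceManagementClient(
    DefaultAzureCredential(), subscription_id="<subscription-id>"
)

group = ContainerGroup(
    location="eastus",
    os_type="Linux",
    restart_policy="Never",  # one-shot execution, mirroring the Fargate flow
    containers=[
        Container(
            name="pipeline",
            image="datastori/pipeline:latest",  # placeholder image name
            resources=ResourceRequirements(
                requests=ResourceRequests(cpu=1.0, memory_in_gb=4.0)
            ),
        )
    ],
)

# "customer-pipelines-rg" is a placeholder for any resource group the
# customer designates.
client.container_groups.begin_create_or_update(
    "customer-pipelines-rg", "datastori-run-001", group
).result()
```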

Tip: Please write to contact@datastori.io for assistance with sizing and setting up your cloud infrastructure.